Skip to content

feat(ci): enforce reference.conf CI check#6795

Open
bladehan1 wants to merge 2 commits into
tronprotocol:release_v4.8.2from
bladehan1:ci/reference-conf-gate
Open

feat(ci): enforce reference.conf CI check#6795
bladehan1 wants to merge 2 commits into
tronprotocol:release_v4.8.2from
bladehan1:ci/reference-conf-gate

Conversation

@bladehan1
Copy link
Copy Markdown
Collaborator

@bladehan1 bladehan1 commented May 21, 2026

What does this PR do?

Adds a CI gate that validates common/src/main/resources/reference.conf on every PR / push, enforcing four rules:

  • lowerCamelCase keys — every dot-separated segment of every key must match ^[a-z][a-zA-Z0-9]*$: starts with a lowercase ASCII letter, then ASCII letters/digits only. Acronym runs after position 1 (e.g. httpPBFTEnable, openHistoryQueryWhenLiteFN, allowShieldedTRC20Transaction) are accepted — only the first character is constrained, which matches what java.beans.Introspector / ConfigBeanFactory actually require for bean-property auto-binding.
  • Depth ≤ 5 — set exactly at the current max in tree (no buffer; new deeper keys must explicitly bump MAX_DEPTH via a reviewed change). Array steps count toward depth (each [] = +1 level); nested arrays (HOCON list-of-list) are supported.
  • Service-port uniqueness — leaves whose last segment is port or ends in Port must bind distinct integer values. Sentinels 0 (disabled) and -1 (auto/unset) are exempt. Paths containing [] are excluded — per-record fields (e.g. genesis-witness ports), not local-process bindings. Added in response to review feedback to prevent silent port-collision regressions.
  • ALLOWLIST escape hatch — seven legacy keys grandfathered for the camelCase rule (4 PBFT + 3 node.shutdown.*).

Implementation: .github/scripts/check_reference_conf.py using pyhocon (pinned via .github/scripts/requirements.txt), wired into the existing checkstyle job in .github/workflows/pr-check.yml. actions/setup-python enables pip caching keyed on the requirements file. Violations produce per-line stdout output plus one consolidated ::error file=...,title=reference.conf::... GHA annotation so failures show up in the PR check summary.

Why are these changes required?

reference.conf is the single source of default values for every config key in java-tron, and key names must auto-bind to XxxConfig.java bean fields via Typesafe Config's ConfigBeanFactory. This gate is stricter than typical Java/Spring projects (Spring Boot's Binder / Quarkus / Micronaut tolerate kebab-case, snake_case, UPPER_CASE, and camelCase as equivalent forms) because ConfigBeanFactory uses java.beans.Introspector and requires the HOCON key to match the JavaBean property name exactly — no case normalization. Without a CI gate, non-conforming keys silently fail to bind and accumulate.

A depth ceiling also prevents deeply nested hierarchies from creeping in. The gate is set exactly at the current max with no buffer: silent drift is unacceptable for a mature project, so any new key that needs to go deeper has to bump MAX_DEPTH explicitly and be reviewed.

Service-port collisions are a hard startup failure with a confusing error message at runtime; the uniqueness check surfaces them at PR time instead.

This PR is the release_v4.8.2 counterpart of #6792 (which targets develop), so the same gate runs on hotfix PRs against the release branch.

This PR has been tested by:

  • Manual Testing: ran the script against release_v4.8.2's reference.conf — passes with 296 keys, max depth 5, all service ports unique.
  • Local fixture suite (not committed) covering: empty / simple / dotted-form / at-max-depth / object array / scalar array / allowlist / format violations (Pascal / snake / digit / array-internal / non-allowlisted acronym) / depth violations (dotted + in-array) / port collision (same value / sentinel-exempt / array-path-exempt) / combined + dedup invariant / env errors (parse failure, missing file). All assertions PASS.
  • Verified: parse failure (invalid HOCON) and missing file both exit 2 with ::error annotation; a single user-declared deep key produces exactly one depth violation line (leaf-only filtering).

Follow up

Extra details

  • Allowlist exempts 7 legacy keys to keep user config.conf compatible: node.http.PBFTEnable, node.http.PBFTPort, node.rpc.PBFTEnable, node.rpc.PBFTPort, plus node.shutdown.BlockTime/BlockHeight/BlockCount (PascalCase exceptions handled manually in NodeConfig.fromConfig; currently commented out in reference.conf — pre-listed so the gate stays green if ever uncommented with defaults).
  • Library choice: pyhocon over Typesafe Config because the gate runs before Gradle/JDK setup in CI (~3 steps: setup-python + pip install + invoke). Both implement the HOCON spec; outputs are equivalent.

@barbatos2011
Copy link
Copy Markdown
Contributor

Solid CI gate, with two small items to address in this PR plus a recommendation on how to land the broader gate strategy.

Two issues to address in this PR

1. node.shutdown.BlockTime / BlockHeight / BlockCount should be in ALLOWLIST

PR #6790's docs/configuration-conventions.md explicitly lists these as a third documented PascalCase exception (alongside the PBFT* family and isOpenFullTcpDisconnect), handled via manual reads in NodeConfig.fromConfig. They pass the gate today only because they're commented out in reference.conf. If anyone ever uncomments them with default values (e.g. a future "graceful shutdown defaults" PR), CI rejects a legitimate documented exception.

Suggested addition:

ALLOWLIST = {
    "node.http.PBFTEnable",
    "node.http.PBFTPort",
    "node.rpc.PBFTEnable",
    "node.rpc.PBFTPort",
    # node.shutdown.BlockTime/BlockHeight/BlockCount: PascalCase exceptions
    # documented in configuration-conventions.md, read manually in
    # NodeConfig.fromConfig. Currently commented out in reference.conf;
    # listed here to prevent CI breakage if ever uncommented with defaults.
    "node.shutdown.BlockTime",
    "node.shutdown.BlockHeight",
    "node.shutdown.BlockCount",
}

2. Regex description vs actual behavior

The regex ^[a-z][a-zA-Z0-9]*$ is described as enforcing "lowerCamelCase," but it actually accepts acronym runs anywhere after position 1 — e.g. httpPBFTEnable, pBFTExpireNum, openHistoryQueryWhenLiteFN, allowShieldedTRC20Transaction all pass (and all exist in current reference.conf).

This is a conscious choice — those names work with ConfigBeanFactory because they start lowercase, which is the actual constraint — but the script docstring / PR description should be precise so reviewers don't later challenge them as violations. Suggested wording:

"Each segment must start with a lowercase ASCII letter and contain only ASCII letters/digits. Acronym runs after position 1 (e.g. httpPBFTEnable) are accepted: this matches what java.beans.Introspector tolerates, which is what ConfigBeanFactory actually needs."

Context: why a strict gate is justified

The lowerCamelCase + depth-limit approach is stricter than typical Java/Spring projects, which usually rely on bean binding to catch silent failures plus relaxed name matching (Spring Boot's Binder / ConfigurationPropertyName API tolerates kebab-case, snake_case, UPPER_CASE, and camelCase as equivalent forms; Quarkus and Micronaut behave similarly). The aggressive gate here is justified because java-tron uses Typesafe Config's ConfigBeanFactory, which uses java.beans.Introspector and requires the HOCON key to match the JavaBean property name exactly — no case normalization. Without a CI gate, non-conforming keys silently fail to bind and accumulate.

One sentence in the PR description acknowledging "stricter than typical Java projects, but necessary because ConfigBeanFactory does no case normalization" would help future contributors understand the choice.

Recommendation: extend this PR with Gates 2–5 as separate commits

Think of this PR as Gate 1. A new config key being "reasonable" actually has 5–6 independent dimensions. Suggest landing all of them in this PR with one commit per gate (rather than as follow-up PRs):

Gate Goal Catches Implementation
1 (this commit) Naming + depth Foo, some_key, depth > 5 Python + pyhocon
2 Bean ↔ key parity + default drift key has no bean field, field has no key, Java default ≠ reference.conf default Python + javalang or Gradle JUnit using Class.getDeclaredFields() + ConfigFactory.defaultReference()
3 Binding chain completeness bean field exists but applyXxxConfig doesn't assign to PARAMETER, or no production consumer reads getXxx() Java AST or Gradle JUnit reflection
4 Inline comment coverage Leaf keys without #/// documentation Python + pyhocon + raw text scan (carry-forward inline comments)
5 XxxConfigTest coverage Bean field present but no test assertion references it Gradle JUnit (reflection on test bytecode)

(Gate 6 — deprecation policy via git history scan — is qualitatively different and lower priority; defer.)

Why one PR + multi-commit rather than 5 follow-up PRs:

  1. Shared infrastructure — Gates 1/2/4 all use pyhocon, share GHA annotation emission, live in .github/scripts/. Keeping them together avoids cross-PR refactoring.
  2. Coherent design surface — reviewer can spot rules that interact (e.g. comment-coverage gate's line-attribution must match how naming gate walks segments).
  3. Avoids the "follow-up never happens" trap — multi-PR roadmaps in OSS projects often stall after Gate 1 lands.
  4. Independent rollback preservedgit revert <gate-N-commit-sha> is as clean as reverting a standalone PR.
  5. Project velocity context — the config series (refactor(config): simplify applyEventConfig (follow-up to #6615) #6735, fix(config): optimizes config binding for node  #6755, refactor(config): merge test config files #6788, refactor(config): overhaul config docs, fix defaults, remove dead params #6790, refactor(config): remove unused storage index and json parsing config #6794, this PR) is concentrated in a few contributors; one focused PR avoids ×5 scoping/CI/review overhead.

Why Gate 2 is the highest-leverage follow-up: it's exactly the gate that would have caught the recent class of issues this series has been fixing manually:

Suggested commit layout:

1. feat(ci): add reference.conf naming + depth gate           ← current PR content
2. feat(ci): add bean ↔ key parity + default drift gate       ← Gate 2 (highest ROI)
3. feat(ci): add config binding chain completeness gate
4. feat(ci): add inline comment coverage gate
5. feat(ci): add XxxConfigTest field coverage gate

For Gate 2 implementation, Gradle JUnit using Java reflection is likely simpler than Python + AST parsingClass.getDeclaredFields() walks bean fields naturally, no Java AST parser needed. ~4–6 hr starting cost.

TL;DR

  • ✅ Approve in principle — Gate 1 is solid, regex/depth logic verified empirically (296 keys, max depth 5)
  • 🟠 Fix 2 small items before merge — add shutdown.BlockTime/Height/Count to ALLOWLIST; tighten regex description in script
  • 📋 Suggest extending to all 5 gates in this PR as separate commits, rather than 5 follow-up PRs — shared infrastructure, coherent design surface, avoids the "follow-up never happens" trap

@bladehan1
Copy link
Copy Markdown
Collaborator Author

Two changes have been implemented.
Agreed that Gate 2 is the highest ROI going forward, but this PR should not be merged. The scope of Gate 2 (which packages count as config beans, cross-type default value comparison, Args.java boundary, toolchain selection) is substantial enough for its own PR.

Gates 3/4/5 overlap with Gate 2.Gate 2's scope must be clarified first before we can determine whether 3/4/5 still stand independently

@317787106
Copy link
Copy Markdown
Collaborator

Good gate — the naming-convention and depth ceiling cover two of the most common drift vectors. A few additional constraints worth considering for this script or follow-up PRs, roughly in order of implementation cost:

1. Duplicate-key detection

HOCON silently keeps the last value when the same key appears twice, making typo-overwrites invisible. A text-level scan (before ConfigFactory.parse_file) can surface these cheaply without pyhocon involvement:

import collections
seen = collections.Counter()
for line in path.read_text().splitlines():
    m = re.match(r'^\s*([\w.]+)\s*[={]', line)
    if m:
        seen[m.group(1)] += 1
duplicates = [k for k, c in seen.items() if c > 1]

2. Port-uniqueness check

reference.conf currently defines 10+ default ports. Two services sharing a port is a hard startup failure with a confusing error message. Extracting all *[Pp]ort leaf values and asserting they are distinct costs ~5 lines and
would prevent port-collision regressions from landing silently.

3. No non-empty default for sensitive keys

Keys whose path contains password, secret, privateKey, or token should default to "" or be absent. A regex scan over leaf values would catch any accidental weak-credential default before it reaches a release tag.

4. Numeric-range sanity

Many leaf types have well-known valid ranges implied by their names:

┌────────────────────────────────┬────────────────┐
│            Pattern             │ Expected range │
├────────────────────────────────┼────────────────┤
│ *[Pp]ort                       │ 1024–65535     │
├────────────────────────────────┼────────────────┤
│ *[Tt]imeout / *[Ii]nterval     │ > 0            │
├────────────────────────────────┼────────────────┤
│ *[Ss]ize / *[Cc]ount / *[Nn]um │ ≥ 0            │
└────────────────────────────────┴────────────────┘

These can be checked with a small name→constraint dispatch table using isinstance(value, int) after pyhocon parsing.

5. Bean-field ↔ key parity (the "Gate 2" mentioned in the PR description)

This is the highest-leverage follow-up: reflect each *Config class to enumerate declared fields and cross-check against reference.conf keys in both directions — orphan keys (in conf, no field) and unbound fields (field
exists, no default in conf). This is what would have caught indexDirectory and maxNestingDepth at PR time rather than requiring a manual audit like #6794. The implementation is more involved (needs javap or AST parsing), but
the --list-keys output from this script is a natural input for it — worth adding a --list-keys flag now to keep the two gates composable.

@bladehan1
Copy link
Copy Markdown
Collaborator Author

bladehan1 commented May 25, 2026

Thanks for the thorough list. Status by item:

1 Duplicate-key detection — declined.

Duplicate keys in a well-formatted reference.conf are visible in code review (the file has section ordering, and the header documents it). pyhocon exposes no API for it, so the only path is a hand-rolled text scanner — exactly the trap this PR paid for once already: the original homemade scanner mis-handled key = { ... } blocks and under-counted by 17 keys / 1 depth level, which is why we switched to pyhocon. The proposed regex would also stumble on dotted-form keys inside {} blocks, += appends, // line comments, and triple-quoted strings — silent false-positive or miss in each case. Maintaining a second parallel scanner doesn't pay for a class of bug already visible to reviewers.

2 Port-uniqueness check — implemented in this PR.

Added as rule 4 in check_reference_conf.py: leaves whose last segment is port or ends in Port must bind distinct integer values. Sentinels 0 (disabled) and -1 (auto/unset) are exempt. Paths containing [] are excluded — per-record fields (e.g. genesis-witness ports), not local-process bindings. Uses pyhocon's parsed tree, no extra deps.

3 Sensitive-default scan — not a fit for CI.

This is policy-shaped ("what counts as sensitive", "warn vs fail", "where the empty-placeholder allowlist lives"), not a config-grammar check. The decisions are release-engineering and business, not syntactic. CI gates work when the rule is deterministic from the file alone; this one isn't, and pushing it into CI just moves the policy decision into a regex with an ever-growing exemption list.

4 Numeric-range sanity — not a fit for CI either.

Range constraints belong with the bean that owns the value (e.g. fromConfig-side validation in *Config.java, or JSR-303-style annotations on the field), not with a regex over reference.conf. The file lists values; the bean knows their meaning. A pattern-dispatch table in CI separates the constraint from its owner and accumulates per-pattern exemptions (*[Pp]ort legitimately admits 0 / -1; *[Ss]ize may be normalized from bytes; etc.) until the gate's signal-to-noise is worse than a code review.

@bladehan1
Copy link
Copy Markdown
Collaborator Author

bladehan1 commented May 25, 2026

For Gate2
Recommendation: open a separate follow-up PR, not extend this one.
Using unit test constraints might be more appropriate.
the dead-key cleanup from #6794 — has a clear scope in *Config.java,
not CI. It has a hard prerequisite ordering:

With those merged, the follow-up has a smaller, cleaner scope.

For this PR: keep it scoped to Gate 1 (naming + depth + port-uniqueness)

bladehan1 added 2 commits May 25, 2026 14:50
Adds a Python-based CI gate (.github/scripts/check_reference_conf.py)
that parses common/src/main/resources/reference.conf via pyhocon and
enforces two rules on every key path:

  1. Each segment must start with a lowercase ASCII letter and contain
     only ASCII letters/digits. Acronym runs after position 1 (e.g.
     httpPBFTEnable) are accepted: this matches what java.beans.Introspector
     tolerates, which is what ConfigBeanFactory actually requires.

  2. Maximum nesting depth of 5 segments.

ALLOWLIST contains documented PascalCase exceptions handled via manual
reads in *Config.fromConfig: node.http.PBFTEnable / PBFTPort,
node.rpc.PBFTEnable / PBFTPort, and node.shutdown.BlockTime / BlockHeight /
BlockCount (the last three are currently commented out in reference.conf;
allowlisted to prevent CI breakage if ever uncommented with defaults).

pyhocon version is pinned in .github/scripts/requirements.txt; pip cache
is enabled in pr-check.yml.

Without a CI gate, non-conforming keys silently fail to bind under
ConfigBeanFactory and accumulate. Spring Boot / Quarkus / Micronaut
tolerate kebab-case / snake_case / UPPER_CASE via case normalization;
Typesafe Config's ConfigBeanFactory does not, so the stricter gate is
the right fit for java-tron.
Add rule 4 to the reference.conf gate: leaves whose last segment is `port`
or ends in `Port` must bind distinct integer values across the file.
Sentinel values 0 and -1 are exempt (disabled / auto-unset). Paths
containing `[]` are excluded because list-element ports belong to
per-record fields (e.g. genesis witnesses), not to the local process.

The check reuses the already-parsed pyhocon ConfigTree — no additional
parsing, no new dependencies. Violations are surfaced via the existing
consolidated GHA `::error` annotation alongside format/depth violations.
@bladehan1 bladehan1 force-pushed the ci/reference-conf-gate branch from f1747ac to d7ccca0 Compare May 25, 2026 06:52
@Sunny6889
Copy link
Copy Markdown
Contributor

[Should] port uniqueness check is undocumented in the PR description

The PR description lists three rules (lowerCamelCase, MAX_DEPTH, ALLOWLIST), but the script also enforces a fourth: service-binding port values must be unique (find_port_collisions). That's a meaningful additional behavior — reviewers reading the title "lowerCamelCase and max depth" won't expect a port-value lint to run.

[Discussion] — local fixture suite isn't committed

The PR mentions a "Local fixture suite (not committed) covering: empty / simple / dotted-form / at-max-depth / object array / scalar array / allowlist / format violations / depth violations / combined + dedup invariant / env errors."

Could we commit these as .github/scripts/test_check_reference_conf.py (or under tests/)? Without them, a future change to walk(), _is_port_segment, or the leaf-detection logic could silently regress. CI hardening scripts deserve the same regression coverage as the code they protect.

@bladehan1 bladehan1 changed the title feat(ci): enforce lowerCamelCase and max depth in reference.conf feat(ci): enforce reference.conf naming, depth, and service-port uniqueness May 25, 2026
@bladehan1 bladehan1 changed the title feat(ci): enforce reference.conf naming, depth, and service-port uniqueness feat(ci): enforce reference.conf check May 25, 2026
@bladehan1 bladehan1 changed the title feat(ci): enforce reference.conf check feat(ci): enforce reference.conf CI check May 25, 2026
@bladehan1
Copy link
Copy Markdown
Collaborator Author

Thanks for both items.

1 — Port-uniqueness rule now in the description. Promoted port-uniqueness to a top-level rule in the PR body (alongside lowerCamelCase, depth ≤ 5, and ALLOWLIST). The fixture list in the testing section also expanded to include port-collision cases (same value / sentinel-exempt / array-path-exempt).

2 — Committed test suite: declined for this PR, with reasoning.

The ask is reasonable in principle, but committing one here would set a precedent the project hasn't decided on. Current state of CI scaffolding:

.github/scripts/
├── check_reference_conf.py
└── requirements.txt

This PR's script is the only file under .github/scripts/, and the repo has zero test_*.py files anywhere — none of the existing workflow scripts have inline test coverage either. So this PR would simultaneously be:

  • adding a new CI gate, AND
  • introducing a project-wide "CI scripts should be tested" convention via a single 240-LOC file.

The second decision deserves its own discussion rather than landing implicitly here. Reasons not to bundle:

  • Failure mode is bounded. A regression in the gate means "CI misses a violating key", not "production breaks". Caught at the next manual review of reference.conf, or when a downstream ConfigBeanFactory exception surfaces at process startup.
  • Script audit is cheap. 240 LOC of straight-line Python; a reviewer modifying walk() or _is_port_segment can run the gate against reference.conf locally (which is how the fixture set listed in the PR body was validated) without test-runner setup.
  • Right time to add is when scope grows. If/when Gate 2 lands or the rule set doubles, the per-rule complexity will justify the precedent. At that point a single tests/scripts/ (or similar) directory covering all .github/scripts/ files makes more sense than retrofitting one ad-hoc test file.

Happy to revisit as a follow-up PR if the project wants to formalize "CI scripts under test" as a convention. For now the gate's regression coverage is the documented fixture suite + the gate itself running on every PR, which makes any silent regression visible the next time reference.conf is touched.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants